ABSTRACT

The design of medium access control protocols for a cogni- 
tive user wishing to opportunistically exploit frequency bands 
within parts of the radio spectrum having multiple bands is 
considered. In the scenario under consideration, the avail- 
ability probability of each channel is unknown a priori to the 
cognitive user. Hence efficient medium access strategies must 
strike a balance between exploring the availability of channels 
and exploiting the opportunities identified thus far. Using a 
sequential design approach, an optimal medium access strat- 
egy is derived. To avoid the prohibitive computational com- 
plexity of this optimal strategy, a low complexity asymptoti- 
cally optimal strategy is also developed. The proposed strat- 
egy does not require any prior statistical knowledge about the 
traffic pattern on the different channels. 

Index Terms — Cognitive radio, bandit problem, medium 
access control. 

1. INTRODUCTION 

As a promising technique to increase spectral efficiency of 
overcrowded parts of the radio spectrum, the opportunistic 
spectrum access problem has been the focus of significant 
research activities [1]. The underlying idea is to allow un- 
licensed users (i.e., cognitive users) to access the available 
spectrum when the licensed users (i.e., primary users) are not 
active. The presence of high priority primary users and the 
requirement that the cognitive users should not interfere with 
them introduce new challenges for protocol design. The over- 
arching goal of the current work is to develop a unified frame- 
work for the design of efficient, and low complexity, cognitive 
medium access protocols. 

The spectral opportunities available to cognitive users are 
by their nature time-varying. To avoid interfering with the 
primary network, cognitive users must first probe to deter- 
mine whether there are primary activities before transmission. 
Under the assumption that each cognitive user cannot access 
all of the available channels simultaneously, the main task of 
the medium access protocol is to distributively choose which 
channels each cognitive user should attempt to use in different 
time slots, in order to fully (or maximally) utilize the spectral 



This research was supported by the National Science Foundation under 
Grants ANI-03-38807 and CNS-06-25637. 



opportunities. This decision process can be enhanced by tak- 
ing into account any available statistical information about the 
primary traffic. For example, with a single cognitive user ca- 
pable of accessing (sensing) only one channel at a time, the 
problem becomes trivial if the probability that each channel is 
free is known a priori. In this case, the optimal rule is for the 
cognitive user to access the channel with the highest probability
of being free in all time slots. However, such time-varying
traffic information is typically not available to the cognitive 
users a priori. The need to learn this information on-line 
creates a fundamental tradeoff between exploitation and ex- 
ploration. Exploitation refers to the short-term gain resulting 
from accessing the channel with the estimated highest proba- 
bility of being free (based on the results of previous sensing 
decisions) whereas exploration is the process by which a cog- 
nitive user learns the statistical behavior of the primary traf- 
fic (by choosing possibly different channels to probe across 
time slots). In the presence of multiple cognitive users, the 
medium access algorithm must also account for the competi- 
tion between different users over the same channel. 

In this paper, we develop a unified framework for the de- 
sign and analysis of cognitive medium access protocols in the 
presence of a single cognitive user who can access a single 
channel in each time slot. As argued in the sequel, this frame- 
work allows for the construction of strategies that strike an 
optimal balance between exploration and exploitation. We 
derive an optimal sensing rule that maximizes the expected 
throughput obtained by the cognitive user. Compared with a 
genie-aided scheme, in which the cognitive user knows a pri- 
ori the primary network traffic information, there is a through- 
put loss suffered by any medium access strategy. We obtain a 
lower bound on this loss and further construct a linear com- 
plexity single index protocol that achieves this lower bound 
asymptotically (when the primary traffic behavior changes 
slowly). Similar approaches have been considered in [3] and [4], 
but with different emphases. 

We have also extended our study to networks with multi- 
ple cognitive users and networks with more capable cognitive 
users, and have developed optimal strategies for these sce- 
narios. However, due to space limitations, we do not discuss 
these results here. We also omit the proofs of results presented 
in this paper. Interested readers can refer to [5] for details. 

The rest of this paper is organized as follows. Our network
model is detailed in Section 2. Section 3 develops and
analyzes an optimal strategy for the single cognitive user
scenario. Finally, Section 4 summarizes our conclusions.

2. NETWORK MODEL 

Figure 1 shows the channel model of interest. We consider
a primary network consisting of $N$ non-overlapping channels,
$\mathcal{N} = \{1, \cdots, N\}$, each with bandwidth $B$. The users
in the primary network are operated in a synchronous time-slotted
fashion. We assume that at each time slot, channel $i$
is free with probability $\theta_i$. Let $Z_i(j)$ be a random variable
that equals 1 if channel $i$ is free at time slot $j$ and equals 0
otherwise. Hence, given $\theta_i$, $Z_i(j)$ is a Bernoulli random
variable with distribution
$h_{\theta_i}(z_i(j)) = \theta_i \delta(1) + (1 - \theta_i)\delta(0)$,
where $\delta(\cdot)$ is a delta function. Furthermore, for a given
$\theta = (\theta_1, \cdots, \theta_N)$, the $Z_i(j)$ are independent
for each $i$ and $j$. We consider a block varying model in which the
value of $\theta$ is fixed for a block of $T$ time slots and then
randomly changes at the beginning of the next block according to a
joint probability density function (pdf) $f(\theta)$.

[Figure: time slots t = 1 to t = T of one block; spectrum opportunities.]

Fig. 1. Channel model.
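As a rough illustration of the block varying model, the following Python sketch draws the availability vector once per block and then samples the indicators Z_i(j) independently. The uniform choice for f(theta), the function name simulate_block, and the seeded generator are assumptions made for this example only; the model itself allows any joint pdf.

```python
import random

def simulate_block(num_channels, block_len, rng):
    """Draw theta for one block, then sample Z[i][j] ~ Bernoulli(theta[i])."""
    # Assumption: f(theta) is uniform on [0, 1]^N (the paper leaves f general).
    theta = [rng.random() for _ in range(num_channels)]
    # z[i][j] = 1 if channel i is free in slot j of this block, else 0.
    z = [[1 if rng.random() < theta[i] else 0 for _ in range(block_len)]
         for i in range(num_channels)]
    return theta, z

rng = random.Random(0)  # fixed seed so the sketch is reproducible
theta, z = simulate_block(num_channels=3, block_len=5, rng=rng)
```

Calling simulate_block once per block reproduces the assumption that theta stays fixed for T slots and is redrawn at the start of the next block.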

In our model, the cognitive users attempt to exploit the 
availability of free channels in the primary network by sens- 
ing the activity at the beginning of each time slot. Our work 
seeks to characterize efficient strategies for choosing which 
channels to sense (access). The challenge here stems from 
the fact that the cognitive users are assumed to be unaware of
$\theta$ a priori. We consider two cases in which a cognitive user
either has or does not have prior information about the pdf of
$\theta$, i.e., $f(\theta)$. In the scenario presented in this paper,
at time slot $j$, a single cognitive user selects one channel
$S(j) \in \mathcal{N}$ to access. If the sensing result shows that
channel $S(j)$ is free, i.e., $Z_{S(j)}(j) = 1$, the cognitive user
can send $B$ bits over this channel; otherwise, the cognitive user
will wait until the next time slot and pick a possibly different
channel to access. Therefore, the total number of bits that the
cognitive user is able to send over one block (of $T$ time slots) is

$$W = \sum_{j=1}^{T} B Z_{S(j)}(j).$$

It is clear that $W$ is a random variable that depends on
the traffic in the primary network and, more importantly for
us, the medium access protocols employed by the cognitive
user. Therefore, the overarching goal of this paper is to construct
low complexity medium access protocols that maximize $E\{W\}$.



Intuitively, the cognitive user would like to select the channel
with the highest probability of being free in order to obtain
more transmission opportunities. If $\theta$ is known then this
problem is trivial: the cognitive user should choose the channel
$i^* = \arg\max_{i \in \mathcal{N}} \theta_i$ to sense. The
uncertainty in $\theta$ imposes a fundamental tradeoff between
exploration, in order to learn $\theta$, and exploitation, by
accessing the channel with the highest estimated free probability
based on current available information, as detailed in the
following section.

3. OPTIMAL MEDIUM ACCESS PROTOCOLS 

We start by developing the optimal solution under the idealized
assumption that $f(\theta)$ is known a priori by the cognitive
user. As we will see, this optimal medium access algorithm
suffers from a prohibitive computational complexity
that grows exponentially with the block length T. This mo- 
tivates the design of low complexity asymptotically optimal 
approaches, which we also consider. 

Our cognitive medium access problem belongs to the class 
of bandit problems. In this setting, the decision maker must 
sequentially choose one process to observe from $N \geq 2$
stochastic processes. These processes usually have parameters that
are unknown to the decision maker, and a utility function is
associated with each observation. The objective of the decision
maker is to maximize the sum or discounted sum of the
utilities via a strategy that specifies which process to observe 
for every possible history of selections and observations. A 
comprehensive treatment covering different variants of ban- 
dit problems can be found in [2]. 

We are now ready to rigorously formulate our problem. 
The cognitive user employs a medium access strategy $\Gamma$, which
will select channel $S(j) \in \mathcal{N}$ to sense at time slot $j$
for any possible causal information pattern obtained through the
previous $j - 1$ observations:
$\Psi(j) = \{s(1), Z_{s(1)}(1), \cdots, s(j-1), Z_{s(j-1)}(j-1)\}$,
$j \geq 2$, i.e., $s(j) = \Gamma(f, \Psi(j))$. Notice that
$Z_{s(j)}(j)$ is the sensing outcome of the $j$th time slot, in which
$s(j)$ is the channel being accessed. If $j = 1$, there is no
accumulated information, and thus $\Psi(1) = \phi$ and
$s(1) = \Gamma(f)$. The utility that the cognitive user obtains by
making decision $S(j)$ at time slot $j$ is the number of bits it can
transmit at time slot $j$, which is $B Z_{S(j)}(j)$. We denote the
expected value of the payoff obtained by a cognitive user who uses
strategy $\Gamma$ as

$$W_\Gamma = E_f\left\{\sum_{j=1}^{T} B Z_{S(j)}(j)\right\}. \qquad (1)$$

We further denote $V^*(f, T) = \sup_{\Gamma} W_\Gamma$, which is the
largest throughput that the cognitive user could obtain when the
spectral opportunities are governed by $f(\theta)$ and the exact
value of each realization of $\theta$ is not known a priori by the
user.

Each medium access decision made by the cognitive user 
has two effects. The first one is the short-term gain, i.e., an 
immediate transmission opportunity if the chosen channel is 
found free. The second one is the long-term gain, i.e., the 
updated statistical information about $f(\theta)$. This information
will help the cognitive user in making better decisions in future
stages. There is an interesting tradeoff between the short
and long-term gains. If we only want to maximize the short- 
term gain, we can choose the channel with the highest avail- 
ability probability to sense, based on the current information. 
This myopic strategy maximally exploits the existing infor- 
mation. On the other hand, by choosing other channels to 
sense, we gain statistical information about $f(\theta)$ which can
effectively guide future decisions. This process is typically 
referred to as exploration, as noted previously. 

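More specifically, let $f^j(\theta)$ be the updated pdf after making
$j - 1$ observations. We begin with $f^1(\theta) = f(\theta)$. After
observing $Z_{s(j)}(j)$, we update the pdf using the following
Bayesian formula.

1. If $Z_{s(j)}(j) = 1$,
   $f^{j+1}(\theta) = \dfrac{\theta_{s(j)} f^j(\theta)}{\int \theta_{s(j)} f^j(\theta)\, d\theta}$.

2. If $Z_{s(j)}(j) = 0$,
   $f^{j+1}(\theta) = \dfrac{(1 - \theta_{s(j)}) f^j(\theta)}{\int (1 - \theta_{s(j)}) f^j(\theta)\, d\theta}$.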
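To make the Bayesian update concrete, here is a minimal Python sketch for one special case. It assumes, purely for illustration, that the prior factorizes into independent Beta(a_i, b_i) densities, one per channel; multiplying such a density by theta_i or by (1 - theta_i) and renormalizing again gives a Beta density, so the update reduces to incrementing a counter. The names bayes_update and posterior_mean are hypothetical helpers, not part of the paper.

```python
def bayes_update(params, channel, free):
    """Return updated Beta parameters after sensing `channel`.

    params: list of (a, b) pairs, one Beta(a, b) density per channel
            (assumed independent across channels).
    free:   True if the sensed channel was free (Z = 1), False otherwise.
    """
    a, b = params[channel]
    updated = list(params)  # copy so the prior is left untouched
    # Beta(a, b) times theta, renormalized, is Beta(a + 1, b);
    # Beta(a, b) times (1 - theta), renormalized, is Beta(a, b + 1).
    updated[channel] = (a + 1, b) if free else (a, b + 1)
    return updated

def posterior_mean(params, channel):
    """Mean of theta_i under Beta(a, b), i.e. a / (a + b)."""
    a, b = params[channel]
    return a / (a + b)

prior = [(1, 1), (1, 1)]              # uniform prior on each theta_i
post = bayes_update(prior, 0, True)   # channel 0 was sensed free
```

Only the sensed channel's parameters change, which mirrors the fact that the update formula involves theta_{s(j)} alone. For a general joint pdf the normalizing integral has to be evaluated numerically.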



The following result characterizes the optimal strategy that 
maximizes the average throughput the cognitive user obtains 
from the network. 

Lemma 1 For any prior pdf $f$, there exists an optimal strategy
$\Gamma^*$ to the channel selection problem (1), and $V^*(f, T)$ is
achievable. Moreover, $V^*$ satisfies the following condition:

$$V^*(f, T) = \max_{s(1) \in \mathcal{N}} E_f\left\{ B Z_{s(1)} + V^*\left(f_{Z_{s(1)}}, T - 1\right) \right\}, \qquad (2)$$

where $f_{Z_{s(1)}}$ is the conditional distribution updated using the
Bayesian rule described above, as if the cognitive user chooses
$s(1)$ and observes $Z_{s(1)}$. Also, $V^*(f_{Z_{s(1)}}, T - 1)$ is
the value of a bandit problem with prior information $f_{Z_{s(1)}}$
and $T - 1$ sequential observations.

In principle, Lemma 1 provides the solution to problem (1).
Effectively, it decouples the calculation at each stage, and
hence, allows the use of dynamic programming to solve the
problem. The idea is to solve the channel selection problem
with a smaller dimension first and then use backward deduction
to obtain the optimal solution for a problem with a larger
dimension. Starting with $T = 1$, the second term inside the
expectation in (2) is 0, since $T - 1 = 0$. Hence, the optimal
solution is to choose the channel $i$ having the largest
$E_f\{B Z_i\}$, which can be calculated as
$E_f\{B Z_i\} = B \int \theta_i f(\theta)\, d\theta$, and
$V^*(f, 1) = \max_{i \in \mathcal{N}} E_f\{B Z_i\}$. With the solution
for $T = 1$ at hand, we can now solve the $T = 2$ case using (2).
First, for every possible choice of $s(1)$ and possible observation
$Z_{s(1)}$, we calculate the updated distribution $f_{Z_{s(1)}}$ using
the Bayesian formula. Next, we calculate $V^*(f_{Z_{s(1)}}, 1)$ (which
is equivalent to the $T = 1$ problem described above). Finally,
applying (2), we have the following equation for the channel
selection problem with $T = 2$:

$$V^*(f, 2) = \max_{i \in \mathcal{N}} \int \left[ B\theta_i + \theta_i V^*(f_{z_i = 1}, 1) + (1 - \theta_i) V^*(f_{z_i = 0}, 1) \right] f(\theta)\, d\theta.$$



Hence, in the first step, the cognitive user should choose
$i^*(1) = \arg\max V^*(f, 2)$, i.e., the maximizing channel $i$ in
the expression for $V^*(f, 2)$, to sense. After observing
$Z_{i^*(1)}$, the cognitive user has
$\Psi(2) = \{i^*(1), Z_{i^*(1)}(1)\}$ and it should choose
$i^*(2) = \arg\max V^*(f_{Z_{i^*(1)}}, 1)$. Similarly, after solving
the $T = 2$ problem, one can proceed to solve the $T = 3$ case. Using
this procedure recursively, we can solve the problem with $T - 1$
observations. Finally, our original problem with $T$ observations
is solved as follows.

$$V^*(f, T) = \max_{i \in \mathcal{N}} \int \left[ B\theta_i + \theta_i V^*(f_{z_i = 1}, T - 1) + (1 - \theta_i) V^*(f_{z_i = 0}, T - 1) \right] f(\theta)\, d\theta.$$
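The backward recursion can be sketched in a few lines of Python for a special case. The sketch assumes independent Beta(a_i, b_i) priors per channel (an assumption of this example, not of the paper), so that the integral over theta reduces to the posterior mean a_i / (a_i + b_i) and each Bayesian update is a counter increment. The normalization B = 1 and the function name value are likewise illustrative.

```python
from functools import lru_cache

B = 1.0  # bits per free slot; normalized to 1 for this example

@lru_cache(maxsize=None)
def value(params, horizon):
    """V*(f, T) for independent Beta priors.

    params:  tuple of (a, b) pairs, one Beta(a, b) density per channel.
    horizon: number of remaining time slots T.
    """
    if horizon == 0:
        return 0.0
    best = 0.0
    for i, (a, b) in enumerate(params):
        mean = a / (a + b)  # E_f{theta_i} under the current belief
        # Beliefs after observing channel i free (Z = 1) or busy (Z = 0).
        free = params[:i] + ((a + 1, b),) + params[i + 1:]
        busy = params[:i] + ((a, b + 1),) + params[i + 1:]
        candidate = (B * mean
                     + mean * value(free, horizon - 1)
                     + (1.0 - mean) * value(busy, horizon - 1))
        best = max(best, candidate)
    return best
```

For two channels with uniform priors, value(((1, 1), (1, 1)), 1) evaluates to 0.5 and the two-slot value to 13/12. Memoization keeps small instances tractable, but the set of reachable beliefs, and hence the work, still grows quickly with the horizon and the number of channels.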

The optimal solution developed above suffers from a prohibitive
computational complexity. In particular, the dimension of our
search space grows exponentially with the block length $T$.
Moreover, one can envision many practical scenarios in which it
would be difficult for the cognitive user to obtain the prior
information $f(\theta)$. This motivates our pursuit of low
complexity non-parametric protocols which maintain certain
optimality properties and do not depend on $f(\theta)$ explicitly.
Hence, in the following, we aim to develop strategies that depend
only on the information obtained through observations $\Psi$.

For a given strategy $\Gamma$, the expected number of bits the
cognitive user is able to transmit through a block with given
parameters $\theta$ is

$$\sum_{j=1}^{T} \sum_{i=1}^{N} B \theta_i \Pr\{\Gamma(\Psi(j)) = i\}.$$

Recall that $\Gamma(\Psi(j)) = i$ means that, following strategy
$\Gamma$, the cognitive user should choose channel $i$ in time slot
$j$, based on the available information $\Psi(j)$. Here
$\Pr\{\Gamma(\Psi(j)) = i\}$ is the probability that the cognitive
user will choose channel $i$ at time slot $j$, following the
strategy $\Gamma$.

Compared with the idealistic case where the exact value
of $\theta$ is known, in which the optimal strategy for the cognitive
user is to always choose the channel with the largest availability
probability, the loss incurred by $\Gamma$ is given by

$$L(\theta; \Gamma) = \sum_{j=1}^{T} B \theta_{i^*} - \sum_{j=1}^{T} \sum_{i=1}^{N} B \theta_i \Pr\{\Gamma(\Psi(j)) = i\},$$

where $\theta_{i^*} = \max\{\theta_1, \cdots, \theta_N\}$. We say that
a strategy $\Gamma$ is consistent if, for any $\theta \in [0, 1]^N$,
there exists $\beta < 1$ such that $L(\theta; \Gamma)$ scales as
$O(T^\beta)$. In the sequel, we use the following notations:
1) $g_1(N) = \omega(g_2(N))$ means that $\forall c > 0$, $\exists N_0$
such that $\forall N > N_0$, $g_2(N) < c\, g_1(N)$;
2) $g_1(N) = O(g_2(N))$ means that $\exists c_1, c_2 > 0$ and $N_0$
such that $\forall N > N_0$, $c_1 g_2(N) < g_1(N) < c_2 g_2(N)$.
For example, consider a loyal scheme in which the cognitive user
selects channel $i$ at the beginning of a block and sticks to it. If
$\theta_i$ is the largest one among $\theta$, $L(\theta; \Gamma) = 0$.
On the other hand, if $\theta_i$ is not the largest one,
$L(\theta; \Gamma) = O(T)$. Hence, this loyal scheme is not
consistent. The following lemma characterizes the fundamental
limits of any consistent scheme.



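Lemma 2 For any $\theta$ and any consistent strategy $\Gamma$, we have

$$\liminf_{T \to \infty} \frac{L(\theta; \Gamma)}{\ln T} \geq B \sum_{i \in \mathcal{N} \setminus \{i^*\}} \frac{\theta_{i^*} - \theta_i}{D(\theta_i \| \theta_{i^*})}, \qquad (3)$$

where $D(\theta_i \| \theta_l)$ denotes the Kullback-Leibler
divergence between the two Bernoulli random variables with
parameters $\theta_i$ and $\theta_l$ respectively:
$D(\theta_i \| \theta_l) = \theta_i \ln\left(\frac{\theta_i}{\theta_l}\right) + (1 - \theta_i) \ln\left(\frac{1 - \theta_i}{1 - \theta_l}\right)$.

Lemma 2 shows that the loss of any consistent strategy scales
at least as $\omega(\ln T)$. An intuitive explanation of this loss is
that we need to spend at least $O(\ln T)$ time slots on sampling
each of the channels with smaller $\theta_i$, in order to get a
reasonably accurate estimate of $\theta$, and hence use it to
determine the channel having the largest $\theta_i$ to sense. We say
that a strategy $\Gamma$ is order optimal if
$L(\theta; \Gamma) = O(\ln T)$.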
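For a feel of the size of this lower bound, the sketch below evaluates the Bernoulli Kullback-Leibler divergence and the coefficient multiplying ln T. It assumes the bound has the form B times the sum, over suboptimal channels, of (theta_{i*} - theta_i) / D(theta_i || theta_{i*}); the function names and the default B = 1 are illustrative choices, and all parameters are assumed to lie strictly between 0 and 1.

```python
import math

def kl_bernoulli(p, q):
    """D(p || q) between Bernoulli(p) and Bernoulli(q), for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def loss_lower_bound_constant(theta, bits_per_slot=1.0):
    """Coefficient of ln T in the assumed form of the lower bound."""
    best = max(theta)
    # Only strictly suboptimal channels contribute to the sum.
    return bits_per_slot * sum(
        (best - t) / kl_bernoulli(t, best) for t in theta if t < best
    )

constant = loss_lower_bound_constant([0.8, 0.5])
```

Channels whose availability is close to the best one have a small divergence and therefore a large contribution: they are the hardest to tell apart from the best channel and require the most sampling.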

Before proceeding to the proposed low complexity order- 
optimal strategy, we first analyze the loss order of some heuris- 
tic strategies which may appear to be reasonable. 

The first simple rule is the random strategy $\Gamma_r$ where, at
each time slot, the cognitive user randomly chooses a channel
from the available $N$ channels. The fraction of time the cognitive
user spends on each channel is therefore $1/N$, leading to the loss
$L(\theta; \Gamma_r) = B \sum_{i=1}^{N} \frac{\theta_{i^*} - \theta_i}{N}\, T = O(T)$.
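The linear growth of the random strategy's loss follows from the loss definition with Pr{channel i chosen} = 1/N in every slot. A short Python check, with the hypothetical helper name random_strategy_loss and B defaulting to 1:

```python
def random_strategy_loss(theta, block_len, bits_per_slot=1.0):
    """Expected loss of uniform random channel selection over one block."""
    best = max(theta)
    n = len(theta)
    # Each slot loses (best - theta_i) with probability 1/N for channel i.
    per_slot = sum((best - t) / n for t in theta)
    return bits_per_slot * block_len * per_slot

loss_100 = random_strategy_loss([0.9, 0.5, 0.1], block_len=100)
loss_200 = random_strategy_loss([0.9, 0.5, 0.1], block_len=200)
```

With availabilities 0.9, 0.5 and 0.1, the per-slot loss is 0.4, so doubling the block length doubles the loss.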

The second one is the myopic rule $\Gamma_g$ in which the cognitive
user keeps updating $f^j(\theta)$, and chooses the channel with
the largest value of $\hat{\theta}_i = \int \theta_i f^j(\theta)\, d\theta$
at each stage. Since there are no convergence guarantees for the
myopic rule, that is, $\hat{\theta}$ may never converge to $\theta$
due to the lack of sufficiently many samples for each channel [6],
the loss of this myopic strategy is $O(T)$.

The third protocol we consider is the staying with the winner
and switching from the loser rule $\Gamma_{SW}$, where the cognitive
user randomly chooses a channel in the first time slot. In the
succeeding time slots, 1) if the accessed channel was found to
be free, it will choose the same channel to sense; 2) otherwise,
it will choose one of the remaining channels based on a
certain switching rule.

Lemma 3 No matter what the switching rule is,
$L(\theta; \Gamma_{SW}) = O(T)$.
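A minimal Python simulation of this rule is sketched below. The paper leaves the switching rule open; here it is assumed, for illustration only, to be a uniform draw among the other channels, and at least two channels are assumed. The function name stay_switch_strategy is hypothetical.

```python
import random

def stay_switch_strategy(theta, block_len, rng):
    """Simulate one block; return the number of slots found free."""
    n = len(theta)
    current = rng.randrange(n)  # random channel in the first slot
    free_slots = 0
    for _ in range(block_len):
        if rng.random() < theta[current]:
            free_slots += 1  # winner: stay on the same channel
        else:
            # Loser: switch. Assumed rule: uniform over the other channels.
            others = [k for k in range(n) if k != current]
            current = rng.choice(others)
    return free_slots

sent = stay_switch_strategy([0.9, 0.5, 0.1], block_len=1000, rng=random.Random(0))
```

Because even the best channel is busy a constant fraction of the time, the user keeps being pushed onto inferior channels throughout the block, which is consistent with the linear loss stated in Lemma 3.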

There are several strategies that have loss of order $O(\ln T)$.
We adopt the following linear complexity strategy from [7]. 

Rule 1 ( Order optimal single index strategy) 

The cognitive user maintains two vectors X and Y, where 
each Xi records the number of time slots in which the cogni- 
tive user has sensed channel i to be free, and each Yi records 
the number of time slots in which the cognitive user has cho- 
sen channel i to sense. The strategy works as follows. 

1. Initialization: at the beginning of each block, each chan- 
nel is sensed once. 

2. After the initialization period, the cognitive user obtains
an estimate $\hat{\theta}$ at the beginning of time slot $j$, given
by $\hat{\theta}_i(j) = X_i(j)/Y_i(j)$, and assigns an index
$\Lambda_i(j) = \hat{\theta}_i(j) + \sqrt{2 \ln j / Y_i(j)}$ to the
$i$th channel. The cognitive user chooses the channel with the
largest value of $\Lambda_i(j)$ to sense at time slot $j$. After each
sensing, the cognitive user updates X and Y.
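The two steps of Rule 1 can be sketched in Python as follows. The sketch assumes the index has the form X_i/Y_i + sqrt(2 ln j / Y_i), with j counted from the start of the block, and it simulates the sensing outcome from a known theta purely for testing; a real user would observe the outcome instead. The function name and the seeded generator are illustrative, and the block is assumed to be at least N slots long.

```python
import math
import random

def single_index_strategy(theta, block_len, rng):
    """Run the single index rule on one block; return (X, Y) per channel."""
    n = len(theta)
    x = [0] * n  # X_i: slots in which channel i was sensed free
    y = [0] * n  # Y_i: slots in which channel i was sensed
    for j in range(1, block_len + 1):
        if j <= n:
            i = j - 1  # initialization: sense each channel once
        else:
            # Index = current estimate plus an exploration bonus that
            # shrinks as channel k accumulates samples.
            index = [x[k] / y[k] + math.sqrt(2.0 * math.log(j) / y[k])
                     for k in range(n)]
            i = max(range(n), key=lambda k: index[k])
        free = rng.random() < theta[i]  # simulated sensing outcome Z_i(j)
        x[i] += 1 if free else 0
        y[i] += 1
    return x, y

x, y = single_index_strategy([0.9, 0.5, 0.1], 2000, random.Random(1))
```

Each slot costs one pass over the N channels, and only the two counters per channel are stored, which is where the linear complexity comes from.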



The intuition behind this strategy is that as long as $Y_i$
grows as fast as $O(\ln T)$, $\hat{\theta}_i$ converges to the true
value of $\theta_i$ in probability, and the cognitive user will
choose the channel with the largest $\theta_i$ eventually. The loss
of $O(\ln T)$ comes from the time spent in sampling the inferior
channels in order to learn the value of $\theta$. This price,
however, is inevitable as established in the lower bound of
Lemma 2.

Finally, we observe that the difference between the myopic
rule and the order optimal single index rule is the additional
term $\sqrt{2 \ln j / Y_i(j)}$ added to the current estimate
$\hat{\theta}_i$. Roughly speaking, this additional term guarantees
enough sampling time for each channel, since if we sample channel
$i$ too sparsely, $Y_i(j)$ will be small, which will increase the
probability that $\Lambda_i$ is the largest index. When $Y_i(j)$
scales as $\ln T$, $\hat{\theta}_i$ will be the dominant term in the
index $\Lambda_i$, and hence the channel with the largest
$\hat{\theta}_i$ will be chosen much more frequently.

4. CONCLUSIONS 

This work has developed a unified framework for the design 
and analysis of cognitive medium access based on the classi- 
cal bandit problem. Our formulation highlights the tradeoff 
between exploration and exploitation in cognitive channel se- 
lection. A linear complexity cognitive medium access algo- 
rithm, which is asymptotically optimal as the number of time 
slots increases, has also been proposed. 